
    Benefit maximizing classification using feature intervals

    For a long time, classification algorithms have focused on minimizing the number of prediction errors by assuming that each possible error has identical consequences. In many real-world situations, however, this assumption does not hold. For instance, in a medical diagnosis domain, misdiagnosing a sick patient as healthy is much more serious than the opposite. For this reason, there is a great need for new classification methods that can handle asymmetric cost and benefit constraints. In this thesis, we discuss cost-sensitive classification concepts and propose a new classification algorithm, called Benefit Maximization with Feature Intervals (BMFI), that uses a feature-projection-based knowledge representation. Within the framework of BMFI, we introduce five different voting methods that are shown to be effective over different domains. A number of generalization and pruning methodologies based on the benefits of classification are implemented and evaluated. Empirical evaluation has shown that BMFI exhibits promising performance compared to recent wrapper cost-sensitive algorithms, despite the fact that classifier performance is highly dependent on the benefit constraints and class distributions in the domain. In order to evaluate cost-sensitive classification techniques, we describe a new metric, namely benefit accuracy, which computes the relative accuracy of the total benefit obtained with respect to the maximum possible benefit achievable in the domain. (İkizler, Nazlı; M.S. thesis)
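    The benefit accuracy metric described in this abstract can be sketched in a few lines. The function and the benefit-matrix layout below are illustrative assumptions, not the thesis's exact formulation:

```python
def benefit_accuracy(y_true, y_pred, benefit_matrix):
    """Relative accuracy of the total benefit obtained with respect to
    the maximum possible benefit achievable on the same instances.
    benefit_matrix[t][p] is the benefit of predicting class p for an
    instance whose true class is t (negative values encode costs)."""
    total = sum(benefit_matrix[t][p] for t, p in zip(y_true, y_pred))
    best = sum(max(benefit_matrix[t]) for t in y_true)
    return total / best

# Illustrative medical-diagnosis benefits: class 0 = sick, 1 = healthy.
# Misdiagnosing a sick patient as healthy (-5) is far worse than the
# opposite error (-1).
B = [[1.0, -5.0],
     [-1.0, 1.0]]
```

    A perfect classifier reaches a benefit accuracy of 1, while systematically making the expensive error drives the score down (it can even go negative when the benefit matrix contains costs).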

    Understanding human motion : recognition and retrieval of human activities

    Ankara : The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2008. Thesis (Ph.D.) -- Bilkent University, 2008. Includes bibliographical references (leaves 111-121). Within the ever-growing video archives lies a vast amount of interesting information regarding human actions and activities. In this thesis, we approach the problem of extracting this information and understanding human motion from a computer vision perspective. We propose solutions for two distinct scenarios, ordered from simple to complex. In the first scenario, we deal with the problem of single action recognition in relatively simple settings. We believe that human pose encapsulates many useful clues for recognizing the ongoing action, and that this shape information can be represented for 2D single actions in very compact forms, before going into the details of complex modeling. We show that high-accuracy single human action recognition is possible 1) using spatial oriented histograms of rectangular regions when the silhouette is extractable, and 2) using the distribution of boundary-fitted lines when the silhouette information is missing. We demonstrate that, in videos, we can further improve recognition accuracy by adding local and global motion information. We also show that, within a discriminative framework, shape information is quite useful even for human action recognition in still images. Our second scenario involves recognition and retrieval of complex human activities in more complicated settings, such as the presence of changing backgrounds and viewpoints. We describe a method of representing human activities in 3D that allows a collection of motions to be queried without examples, using a simple and effective query language. Our approach is based on units of activity at segments of the body that can be composed across time and across the body to produce complex queries.
The presence of search units is inferred automatically by tracking the body, lifting the tracks to 3D and comparing them to models trained using motion capture data. Our models of short-time-scale limb behaviour are built using a labelled motion capture set. Our query language makes use of finite state automata and requires simple text encoding and no visual examples. We show results for a large range of queries applied to a collection of complex motions and activities. We compare with discriminative methods applied to tracker data; our method offers significantly improved performance. We show experimental evidence that our method is robust to view direction and is unaffected by some important changes of clothing. (İkizler, Nazlı; Ph.D. thesis)

    Searching video for complex activities with finite state models

    We describe a method of representing human activities that allows a collection of motions to be queried without examples, using a simple and effective query language. Our approach is based on units of activity at segments of the body that can be composed across space and across the body to produce complex queries. The presence of search units is inferred automatically by tracking the body, lifting the tracks to 3D and comparing them to models trained using motion capture data. Our models of short-time-scale limb behaviour are built using a labelled motion capture set. We show results for a large range of queries applied to a collection of complex motions and activities. We compare with discriminative methods applied to tracker data; our method offers significantly improved performance. We show experimental evidence that our method is robust to view direction and is unaffected by changes of clothing.

    Person search made easy

    In this study, we present a method to extensively reduce the number of retrieved images and to increase retrieval performance for person queries on broadcast news videos. A multi-modal approach which integrates face and text information is proposed. A state-of-the-art face detection algorithm is improved using a skin-color based method to eliminate false alarms. This pruned set is clustered to group similar faces, and representative faces are selected from each cluster to be presented to the user. For six person queries of TRECVID2004, on average, the retrieval rate is increased from 8% to around 50%, and the number of images that the user has to inspect is reduced from hundreds and thousands to tens.
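    The skin-color pruning step can be illustrated with a simple per-pixel rule. The RGB thresholds below come from a common published heuristic family and are purely illustrative; they are not the paper's actual model:

```python
def skin_ratio(pixels):
    """Fraction of (R, G, B) pixels passing a simple skin-color rule.
    The thresholds are an illustrative heuristic, not the paper's model."""
    def is_skin(r, g, b):
        return (r > 95 and g > 40 and b > 20 and
                r > g and r > b and abs(r - g) > 15)
    return sum(1 for (r, g, b) in pixels if is_skin(r, g, b)) / len(pixels)

def prune_false_alarms(face_detections, threshold=0.3):
    """Keep only detections whose skin-pixel ratio is high enough;
    low-ratio detections are discarded as likely false alarms."""
    return [d for d in face_detections if skin_ratio(d) >= threshold]
```

    A detection dominated by skin-toned pixels survives the filter, while a green false alarm (e.g. foliage) is dropped.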

    Human action recognition using distribution of oriented rectangular patches

    We describe a “bag-of-rectangles” method for representing and recognizing human actions in videos. In this method, each human pose in an action sequence is represented by oriented rectangular patches extracted over the whole body. Spatial oriented histograms are then formed to represent the distribution of these rectangular patches. In order to carry the information from the spatial domain described by the bag-of-rectangles descriptor to the temporal domain for recognition of the actions, four different methods are proposed: (i) frame-by-frame voting, which recognizes the actions by matching the descriptors of each frame; (ii) global histogramming, which extends the Motion Energy Image idea of Bobick and Davis to rectangular patches; (iii) a classifier-based approach using SVMs; and (iv) an adaptation of Dynamic Time Warping to the temporal representation of the descriptor. Detailed experiments are carried out on the action dataset of Blank et al. High success rates (100%) show that, with a very simple and compact representation, we can achieve robust recognition of human actions compared to complex representations.
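    The frame-by-frame voting scheme (method (i)) can be illustrated as a nearest-neighbour vote over per-frame descriptors. The data layout and distance interface below are a hypothetical sketch, not the paper's implementation:

```python
from collections import Counter

def frame_by_frame_voting(frame_descriptors, training_set, distance):
    """Each frame votes for the label of its nearest training descriptor;
    the action with the most votes wins. `training_set` is a list of
    (label, descriptor) pairs; `distance` compares two descriptors."""
    votes = Counter()
    for desc in frame_descriptors:
        label = min(training_set, key=lambda item: distance(desc, item[1]))[0]
        votes[label] += 1
    return votes.most_common(1)[0][0]
```

    With histogram descriptors, `distance` would typically be a chi-square or Euclidean histogram distance; the voting logic is unchanged.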

    Görsel Öge Tanıma Modellerinin Eksik Gözetimli Öğrenimi [Learning Visual Recognition Models under Incomplete Supervision]

    Following the advances in deep learning based models, significant progress has been achieved in many research areas under the umbrella of artificial intelligence. Among these, computer vision has been one of the fastest-developing fields. Compared to the pre-deep-learning era, dramatic improvements stand out in image classification, object recognition, object detection, scene recognition, image segmentation, image summarization, and many other computer vision problems. The machine learning advances of the deep learning era rest on three main factors: first, the collection and proliferation of large-scale labelled datasets; second, advances in deep architecture designs that make effective use of large-scale training sets; and third, hardware progress in computational infrastructure that makes it feasible to train deep models with very large numbers of parameters. Especially in computer vision, the first of these three factors, the need to collect large-scale labelled training data, stands out as a major obstacle to developing more comprehensive and human-like perception models. In many problems such as object recognition and object detection, collecting labelled data is a time-consuming and/or costly process, and the practical barriers this creates also cause models to have a semantically narrow scope. The central aim of this project is to produce solutions that overcome the obstacles created by this data requirement of machine learning approaches. The work toward this aim is grouped under three main headings: 1) zero-shot learning, 2) learning when only few and/or sparsely labelled examples are available, and 3) machine learning over weak labels that are relatively easy to collect.
Furthermore, inspired by the techniques used in zero-shot learning, novel approaches were developed for synthesizing new visual styles and for distributed learning aimed at preserving the privacy of sensitive data. The studies carried out within the project have produced a large number of novel approaches and mathematical models. Within the scope of these studies, applications to many important computer vision problems were also investigated, including image classification, object detection, remote-sensing object recognition, sign language recognition, and person recognition. Detailed experimental analyses of the approaches were performed, showing that most of them achieve significant improvements over leading methods in the field. The results have been contributed to the literature through numerous international conference and journal publications.

    Zero-Shot Object Detection by Hybrid Region Embedding

    Object detection is considered one of the most challenging problems in computer vision, since it requires correct prediction of both the classes and the locations of objects in images. In this study, we define a more difficult scenario, namely zero-shot object detection (ZSD), where no visual training data is available for some of the target object classes. We present a novel approach to tackle this ZSD problem, in which a convex combination of embeddings is used in conjunction with a detection framework. For the evaluation of ZSD methods, we propose a simple dataset constructed from Fashion-MNIST images, as well as a custom zero-shot split for the Pascal VOC detection challenge. The experimental results suggest that our method yields promising results for ZSD.
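    The core idea, mapping a detected region into the semantic space as a convex combination of seen-class embeddings and then matching it against unseen-class embeddings, can be sketched as follows. The function names and the cosine-matching step are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def convex_combination_embedding(class_probs, seen_embeddings):
    """Map a detected region into the semantic space as a convex
    combination of seen-class embeddings weighted by detector scores."""
    p = np.asarray(class_probs, dtype=float)
    p = p / p.sum()                         # normalize into convex weights
    return p @ np.asarray(seen_embeddings, dtype=float)

def predict_unseen(region_embedding, unseen_embeddings):
    """Assign the unseen class whose embedding is closest in cosine
    similarity (the matching step is an illustrative choice)."""
    e = np.asarray(region_embedding, dtype=float)
    e = e / np.linalg.norm(e)
    sims = [float(e @ (np.asarray(u, float) / np.linalg.norm(u)))
            for u in unseen_embeddings]
    return int(np.argmax(sims))
```

    A region that the detector scores mostly toward one seen class lands near that class's embedding, so the nearest unseen class in the semantic space becomes the zero-shot prediction.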

    Zero-Shot Sign Language Recognition: Can Textual Data Uncover Sign Languages?

    We introduce the problem of zero-shot sign language recognition (ZSSLR), where the goal is to leverage models learned over seen sign class examples to recognize instances of unseen signs. To this end, we propose to utilize the readily available descriptions in sign language dictionaries as an intermediate-level semantic representation for knowledge transfer. We introduce a new benchmark dataset called ASL-Text that consists of 250 sign language classes and their accompanying textual descriptions. Compared to ZSL datasets in other domains (such as object recognition), our dataset contains a limited number of training examples for a large number of classes, which poses a significant challenge. We propose a framework that operates over the body and hand regions by means of 3D-CNNs and models longer temporal relationships via bidirectional LSTMs. By leveraging the descriptive text embeddings along with these spatio-temporal representations within a zero-shot learning framework, we show that textual data can indeed be useful in uncovering sign languages. We anticipate that the introduced approach and the accompanying dataset will provide a basis for further exploration of this new zero-shot learning problem.
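    A common way to realize such a zero-shot framework is a bilinear compatibility score between video features and text embeddings. The sketch below illustrates that idea under this assumption; the compatibility matrix `W` would be learned on the seen classes, and the function name is hypothetical:

```python
import numpy as np

def zero_shot_classify(video_feature, text_embeddings, W):
    """Score each unseen sign class by a bilinear compatibility between
    the spatio-temporal video feature and the text embedding of its
    dictionary description; return the best-scoring class index.
    W plays the role of a compatibility matrix learned on seen classes."""
    v = np.asarray(video_feature, dtype=float)
    scores = [float(v @ W @ np.asarray(t, dtype=float))
              for t in text_embeddings]
    return int(np.argmax(scores))
```

    At test time only the text embeddings of the unseen classes are needed, which is what lets purely textual dictionary descriptions stand in for missing visual training examples.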